Faster Depth-Adaptive Transformers
Abstract
Depth-adaptive neural networks can dynamically adjust depths according to the hardness of input words, and thus improve efficiency. The main challenge is how to measure such hardness and decide the required depths (i.e., layers) to conduct. Previous works generally build a halting unit to decide whether the computation should continue or stop at each layer. As there is no specific supervision of depth selection, the halting unit may be under-optimized and inaccurate, which results in suboptimal and unstable performance when modeling sentences. In this paper, we get rid of the halting unit and estimate the required depths in advance, which yields a faster depth-adaptive model. Specifically, two approaches are proposed to explicitly measure the hardness of input words and estimate the corresponding adaptive depth, namely 1) mutual information (MI) based estimation and 2) reconstruction loss based estimation. We conduct experiments on the text classification task with 24 datasets of various sizes and domains. Results confirm that our approaches can speed up the vanilla Transformer (up to 7x) while preserving high accuracy. Moreover, efficiency and robustness are significantly improved compared with other depth-adaptive approaches.
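The idea of estimating per-word depth in advance from mutual information can be illustrated with a small sketch. This is not the authors' procedure: the function names `word_label_mi` and `mi_to_depth`, the presence/absence definition of MI, the linear mapping to depths, and the assumption that a higher MI score means more layers are all hypothetical choices made here for illustration.

```python
import math
from collections import Counter


def word_label_mi(docs, labels):
    """Mutual information (in nats) between each word's presence and the class label.

    Assumption: MI is computed over binary word presence per document,
    which is one common choice and not necessarily the paper's definition.
    """
    n = len(docs)
    label_counts = Counter(labels)
    doc_sets = [set(d.split()) for d in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for w in vocab:
        # Joint counts over (word present?, label); only non-zero cells are stored.
        joint = Counter()
        for words, y in zip(doc_sets, labels):
            joint[(w in words, y)] += 1
        present = sum(c for (p, _), c in joint.items() if p)
        mi = 0.0
        for (p, y), c in joint.items():
            p_xy = c / n
            p_x = (present if p else n - present) / n
            p_y = label_counts[y] / n
            mi += p_xy * math.log(p_xy / (p_x * p_y))
        scores[w] = mi
    return scores


def mi_to_depth(scores, max_depth=6):
    """Linearly map MI scores to integer depths in [1, max_depth].

    Assumption: more informative words receive more layers.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {
        w: 1 + round((s - lo) / span * (max_depth - 1))
        for w, s in scores.items()
    }


# Toy corpus: "good"/"bad" determine the label, "movie"/"plot" do not.
docs = ["good movie", "bad movie", "good plot", "bad plot"]
labels = [1, 0, 1, 0]
scores = word_label_mi(docs, labels)
depths = mi_to_depth(scores, max_depth=6)
```

Because the depths are computed once from corpus statistics, a model using them would not need a halting decision at each layer, which is the source of the speed-up the abstract describes. In the toy corpus, the label-bearing words end up with the maximum depth and the uninformative ones with depth 1.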
Similar resources
Computing Tree-Depth Faster Than 2^n
A connected graph has tree-depth at most k if it is a subgraph of the closure of a rooted tree whose height is at most k. We give an algorithm which, for a given n-vertex graph G, computes the tree-depth of G in time O(1.9602^n). Our algorithm is based on combinatorial results revealing the structure of minimal rooted trees whose closures contain G.
Monad Transformers as Monoid Transformers
The incremental approach to modular monadic semantics constructs complex monads by using monad transformers to add computational features to a preexisting monad. A complication of this approach is that the operations associated to the pre-existing monad need to be lifted to the new monad. In a companion paper by Jaskelioff, the lifting problem has been addressed in the setting of system Fω. Her...
Faster Coordinate Descent via Adaptive Importance Sampling
Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically character...
Faster Adaptive Set Intersections for Text Searching
The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001], by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corrob...
Faster Ray Tracing Using Adaptive Grids
Efficient ray tracing has been a contradiction in terms since its introduction as a computer version of ideas found in Dürer's Underweysung der Messung (1525) and Descartes' La Dioptrique (1637). The most effective acceleration techniques developed to reduce ray tracing's high computational cost are based on space coherence: bounding box hierarchies and space subdivision. During pre-processing...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i15.17584